In a future exercise, you will pivot the wide data produced by your subsetting script from a previous exercise into long format. Once you have that long file, you will need to verify its content and clean it up, whether by cleaning rows or by computing or selecting relevant variables. You will then need to filter the data in some way and, at the very least, produce descriptive statistical summaries.
For this exercise, you will read data, select variables to subset the data, filter cases/observations, group, and summarize the data of a long file.
Data files:
Read “sspan_long_clean.Rds” and assign to an object.
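As a minimal sketch of reading an .Rds file into an object: the toy data frame, the temporary file, and the object name `SSPAN_demo` are all assumptions made for illustration. Your own call would point at wherever “sspan_long_clean.Rds” lives in your project.

```r
# Toy stand-in for the exercise data; the real file and its columns may differ.
toy <- data.frame(
  id_subject = c(1, 1, 2),
  sspan      = c(3, NA, 5)
)

# Write the toy data to a temporary .Rds file so the example is self-contained.
tmp <- tempfile(fileext = ".Rds")
saveRDS(toy, tmp)

# readRDS() returns the stored object; assign it to a name of your choosing.
SSPAN_demo <- readRDS(tmp)

str(SSPAN_demo)
```

Because `readRDS()` returns the object rather than restoring it under its old name, the name you assign it to is entirely up to you.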
Examine the structure of your data frame using {dplyr} so that you know variables that you might summarize and the variables you might group by. Load your necessary libraries.
Make a note of the “grouping structure”. Also examine for any oddities that you might need to address.
Take the data frame, pipe it to group_by() with a grouping variable
(e.g., id_wave), and then pipe to examine the structure. Make note
of how the tibble’s grouping structure has changed.
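One way to see the grouping structure change is sketched here on a hypothetical four-row tibble; the column values are made up and only the variable names echo the exercise. The `%>%` pipe is the one re-exported by {dplyr}.

```r
library(dplyr)

# Hypothetical toy data, not the exercise file.
toy <- tibble(
  id_wave = c(1, 1, 2, 2),
  sspan   = c(3, NA, 5, 7)
)

# Ungrouped: glimpse() reports rows and columns only.
toy %>% glimpse()

# Grouped: the glimpse() header also reports the grouping variable.
toy %>% group_by(id_wave) %>% glimpse()

# group_vars() returns the grouping columns as a character vector.
ungrouped_vars <- group_vars(toy)
grouped_vars   <- group_vars(group_by(toy, id_wave))
```

Grouping does not change the rows or columns themselves; it only attaches metadata that later verbs such as mutate() and summarize() respect.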
Add a new variable to the existing data frame that represents the
mean of sspan. Code your function to exclude
NA values when computing the mean. Do not overwrite the
data frame.
Describe what the mean represents:
Using that same grouping structure used earlier (e.g.,
id_wave), group the data and then pipe the data frame and
add a new variable to the data frame that represents the mean of
sspan. Code your function to exclude NA
values. Do not overwrite the data frame.
Describe what the mean represents:
Compare the values of the mean variable added to the data frame with and without the grouping. Is the calculated variable in one of the data frames more similar to what you expected when you mutated the variable? If so, which one?
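The comparison can be illustrated on made-up data. This is a sketch under the assumption of a tiny tibble with two waves; the object names `ungrouped` and `grouped` are hypothetical, and results are assigned to new objects so the original is not overwritten.

```r
library(dplyr)

# Hypothetical toy data, not the exercise file.
toy <- tibble(
  id_wave = c(1, 1, 2, 2),
  sspan   = c(3, NA, 5, 7)
)

# Without grouping: a single grand mean is repeated on every row.
ungrouped <- toy %>%
  mutate(sspan_mean = mean(sspan, na.rm = TRUE))

# With grouping: the mean is computed within each wave and repeated within it.
grouped <- toy %>%
  group_by(id_wave) %>%
  mutate(sspan_mean = mean(sspan, na.rm = TRUE)) %>%
  ungroup()

ungrouped$sspan_mean  # 5 5 5 5
grouped$sspan_mean    # 3 3 6 6
```

In both cases mutate() keeps every row; only the scope over which mean() is evaluated differs.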
Using that same grouping structure that you just used, rather than
add a new variable to the data frame that represents the mean of
sspan, summarize the data frame. Code your function to
exclude NA values. Do not overwrite the data frame.
Using that same (a) grouping structure that you just used, (b) add a
new variable to the data frame that represents the mean of
sspan, (c) then summarize the data frame by that
same variable. Code your function to exclude NA
values. Do not overwrite the data frame.
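A sketch of how summarizing differs from mutating, again on hypothetical toy data. The second pipeline is one possible reading of the mutate-then-summarize step (summarizing the newly added mean variable); that reading is an assumption, not the only one.

```r
library(dplyr)

# Hypothetical toy data, not the exercise file.
toy <- tibble(
  id_wave = c(1, 1, 2, 2),
  sspan   = c(3, NA, 5, 7)
)

# summarize() collapses each group to a single row.
by_wave <- toy %>%
  group_by(id_wave) %>%
  summarize(sspan_mean = mean(sspan, na.rm = TRUE))

# Assumed reading: add the group mean with mutate(), then summarize that variable.
by_wave_2 <- toy %>%
  group_by(id_wave) %>%
  mutate(sspan_mean = mean(sspan, na.rm = TRUE)) %>%
  summarize(sspan_mean = mean(sspan_mean, na.rm = TRUE))

nrow(toy)      # 4
nrow(by_wave)  # 2
```

Here both pipelines give one row per wave with the same means, because the mutated variable is constant within each group.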
In many cases, you may need to group data frames in one way in order to obtain data summaries that you will further summarize at a more general level. For example, you may need to aggregate your data to obtain average performance at the participant level so that you can further aggregate individuals within a group to obtain group-level summaries.
In order to understand the difference in aggregation techniques, we will group the data two ways.
Take your data frame and (a) group by id_school, and
then (b) summarize the data frame so that your new data frame contains
the mean of sspan at the school level. Do not overwrite the
data frame.
Next, take your data frame and (a) group by
id_school and id_subject, (b) summarize the
data frame so that your new data frame contains the mean of
sspan for each participant in each school, (c) group again
but only by the school, (d) summarize the data frame so that your new
data frame contains the school-level mean of the span variable from
step (b) (whatever you named it). Do not overwrite the data frame.
Describe the differences in the summaries and why they exist.
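The two aggregation routes can be contrasted on a deliberately unbalanced toy example. Everything here is hypothetical: the schools, the subjects, the values, and the object names `one_step` and `two_step`. In school A, one participant contributes three rows and another contributes one.

```r
library(dplyr)

# Hypothetical toy data: school A is unbalanced, school B is balanced.
toy <- tibble(
  id_school  = c("A", "A", "A", "A", "B", "B"),
  id_subject = c(1, 1, 1, 2, 3, 4),
  sspan      = c(2, 2, 2, 6, 4, 8)
)

# One step: every row counts equally within a school.
one_step <- toy %>%
  group_by(id_school) %>%
  summarize(sspan_mean = mean(sspan, na.rm = TRUE))

# Two steps: average within participant first, then across participants,
# so every participant counts equally within a school.
two_step <- toy %>%
  group_by(id_school, id_subject) %>%
  summarize(sspan_mean = mean(sspan, na.rm = TRUE), .groups = "drop") %>%
  group_by(id_school) %>%
  summarize(sspan_mean = mean(sspan_mean, na.rm = TRUE))

one_step$sspan_mean  # 3 6
two_step$sspan_mean  # 4 6
```

The routes agree for the balanced school and diverge for the unbalanced one, since the one-step mean weights participants by how many rows they contribute.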
Read “gng_long_clean.Rds” and assign to an object.
Examine the structure of your data frame using {dplyr} so that you know variables that you might summarize and the variables you might group by. Load your necessary libraries.
Make a note of the “grouping structure”. Also examine for any oddities that you might need to address.
Take the data frame, pipe it to group_by() with a grouping variable
(e.g., id_wave), and then pipe to examine the structure. Make note
of how the tibble’s grouping structure has changed.
Add a new variable to the existing data frame that represents the
mean of accuracy. Code your function to exclude
NA values when computing the mean. Do not overwrite the
data frame.
Describe what the mean represents:
Using that same grouping structure used earlier (e.g.,
id_wave), group the data and then pipe the data frame and
add a new variable to the data frame that represents the mean of
accuracy. Code your function to
exclude NA values. Do not overwrite the data frame.
Describe what the mean represents:
Compare the values of the mean variable added to the data frame with and without the grouping. Is the calculated variable in one of the data frames more similar to what you expected when you mutated the variable? If so, which one?
Using that same grouping structure that you just used, rather than
add a new variable to the data frame that represents the mean of
accuracy, summarize the data frame. Code your function to
exclude NA values. Do not overwrite the data frame.
Using that same (a) grouping structure that you just used, (b) add a
new variable to the data frame that represents the mean of
accuracy, (c) then summarize the data frame by that
same variable. Code your function to exclude NA
values. Do not overwrite the data frame.
In many cases, you may need to group data frames in one way in order to obtain data summaries that you will further summarize at a more general level. For example, you may need to aggregate your data to obtain average performance at the participant level so that you can further aggregate individuals within a group to obtain group-level summaries.
In order to understand the difference in aggregation techniques, we will group the data two ways.
Take your data frame and (a) group by id_school, and
then (b) summarize the data frame so that your new data frame contains
the mean of accuracy at the school level. Do not overwrite
the data frame.
Next, take your data frame and (a) group by
id_school and id_subject, (b) summarize the
data frame so that your new data frame contains the mean of
accuracy for each participant in each school, (c) group
again but only by the school, (d) summarize the data frame so that your
new data frame contains the school-level mean of the accuracy variable
from step (b) (whatever you named it). Do not overwrite the data frame.
Describe the differences in the summaries and why they exist.